Gibbs sampling and helix-cap motifs
Abstract
Protein backbones have characteristic secondary structures, including alpha-helices and beta-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of alpha-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of +/-1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
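The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform pseudocounts, the sampling schedule, and the use of a likelihood ratio against a background distribution to represent the null motif are all assumptions made for illustration. Each training example is repeatedly reassigned to a frameshift of -1, 0, or +1 residues relative to its database cap position, or to the null motif, and the total information content of the resulting motif is measured as relative entropy against the background.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumes only the 20 standard residues occur


def window(seq, start, width):
    """Return seq[start:start+width], or None if the window leaves the sequence."""
    if start < 0 or start + width > len(seq):
        return None
    return seq[start:start + width]


def motif_probs(segments, width, pseudocount=1.0):
    """Per-column residue probabilities from aligned segments (uniform pseudocounts)."""
    counts = [{a: pseudocount for a in AMINO_ACIDS} for _ in range(width)]
    for seg in segments:
        for i, aa in enumerate(seg):
            counts[i][aa] += 1.0
    probs = []
    for col in counts:
        total = sum(col.values())
        probs.append({a: c / total for a, c in col.items()})
    return probs


def information_content(probs, background):
    """Total relative entropy in bits, summed over all motif columns."""
    return sum(
        p * math.log2(p / background[a])
        for col in probs
        for a, p in col.items()
        if p > 0
    )


def likelihood_ratio(seg, probs, background):
    """P(segment | motif) / P(segment | background)."""
    ratio = 1.0
    for i, aa in enumerate(seg):
        ratio *= probs[i][aa] / background[aa]
    return ratio


def gibbs_frameshift(seqs, caps, width, background, sweeps=50, seed=0):
    """Sample a state per example: a shift in {-1, 0, +1}, or None for the null motif.

    caps[k] is the database-assigned cap start in seqs[k] (hypothetical input
    format); the unshifted window is assumed to fit inside every sequence.
    """
    rng = random.Random(seed)
    choices = [-1, 0, 1, None]
    states = [0] * len(seqs)  # start from the database assignment
    for _ in range(sweeps):
        for k in range(len(seqs)):
            # Build the motif from every other example currently in the motif.
            others = [
                window(seqs[j], caps[j] + states[j], width)
                for j in range(len(seqs))
                if j != k and states[j] is not None
            ]
            probs = motif_probs(others, width)
            weights = []
            for s in choices:
                if s is None:
                    weights.append(1.0)  # null motif: ratio of background to itself
                    continue
                seg = window(seqs[k], caps[k] + s, width)
                weights.append(0.0 if seg is None else likelihood_ratio(seg, probs, background))
            states[k] = rng.choices(choices, weights=weights)[0]
    return states
```

Restricting `choices` to `[0, None]` gives the variant without frameshifts, so the two settings can be compared by calling `information_content` on the motif built from each final assignment.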
Journal: Nucleic Acids Research
Volume: 33
Publication year: 2005